Brain correlates of speech perception in schizophrenia patients with and without auditory hallucinations

The experience of auditory verbal hallucinations (AVH, “hearing voices”) in schizophrenia has been found to be associated with reduced auditory cortex activation during perception of real auditory stimuli like tones and speech. We re-examined this finding using 46 patients with schizophrenia (23 with frequent AVH and 23 hallucination-free), who underwent fMRI scanning while they heard words, sentences and reversed speech. Twenty-five matched healthy controls were also examined. Perception of words, sentences and reversed speech all elicited activation of the bilateral superior temporal cortex, the inferior and lateral prefrontal cortex, the inferior parietal cortex and the supplementary motor area in the patients and the healthy controls. During the sentence and reversed speech conditions, the schizophrenia patients as a group showed reduced activation in the left primary auditory cortex (Heschl’s gyrus) relative to the healthy controls. No differences were found between the patients with and without hallucinations in any condition. This study therefore fails to support previous findings that experience of AVH attenuates speech-perception-related brain activations in the auditory cortex. At the same time, it suggests that schizophrenia patients, regardless of presence of AVH, show reduced activation in the primary auditory cortex during speech perception, a finding which could reflect an early information processing deficit in the disorder.


Introduction
Auditory verbal hallucinations (AVH) are a core, often distressing, symptom of schizophrenia, estimated to occur in around 70% of patients [1]. Their clinical features are well established: they may be single or multiple, are often derogatory but less commonly neutral or praising, and they can be experienced as originating inside or outside the head (or both) [1,2]. Nevertheless, despite research stretching back over more than half a century [3,4], their underlying basis or bases remain uncertain.
Theoretical approaches to AVH can be classified into two broad categories. One, often termed the 'neurological' model, implicates primarily perceptual mechanisms [5][6][7][8]. This approach can be traced back to Kraepelin [9] who proposed that AVH were due to pathological ('irritative') neuronal activity in the auditory cortex. Modern versions of the theory, however, tend to be more complicated, proposing not only 'bottom-up' perceptual mechanisms, but also 'top-down' cognitive influences which act to give AVH their specific characteristics, such as being the voices of family, friends, or people engaged in supposed conspiracies against the patient [8]. The other, 'cognitive' model, maintains that AVH are the result of cognitive activity that is for unknown reasons misinterpreted as perceptual, for example, inner speech that fails to be labeled as such, or vivid, intrusive memories [10,11] (for a review see Jones [6]).
Functional imaging has played a key role in the investigation of AVH. Several studies have employed the so-called symptom capture paradigm, which compares brain activity when patients hear a voice (which they signal by a button press) to periods where they do not experience them. Some of these studies, in line with the perceptual model, have found AVH-related activation in the superior temporal cortex, which contains the primary auditory cortex (Heschl's gyrus) [12][13][14]. Others, however, have found little or no evidence of temporal cortical activation [15][16][17].
If the mechanisms underlying AVH involve aberrant perceptual activity, it might also be expected that experiencing them will have consequences for brain activity in response to real sounds or speech that are perceived at the same time. While the simultaneous experience of AVH and real auditory stimuli could in principle lead to either greater than normal activation in the auditory cortex (due to summation of activations), or to reduced activation (due to competition for processing resources), in practice the latter has invariably been assumed [8]. This proposal has so far been examined in three studies. David et al. [18] found attenuation to the point of near extinction of activations to real speech in the auditory cortex in a single case study of a continuously hallucinating patient; when his hallucinations improved, activation to real speech sounds increased. Later, the same group [19] examined 8 patients with schizophrenia with a history of hallucinations and 7 without such a history while they listened to external speech. Relative to 8 healthy comparison subjects, both patient groups showed reduced activation in the left superior temporal gyrus and the auditory association cortex. In 7 patients whose hallucinations subsequently improved, activation in the left superior temporal gyrus and the right middle temporal gyrus increased. Plaze et al. [20] examined 15 patients with daily AVH who underwent fMRI while listening to spoken sentences. Whole brain correlational analysis revealed an inverse association between scores on one of two rating scales for AVH and activation in the posterior part of the left superior temporal gyrus, and a similar inverse association was found on the other scale using region of interest (ROI) analysis.
The aim of the present study was to further examine, using fMRI and whole-brain, voxel-based analysis, whether experience of AVH in schizophrenia is associated with changed auditory cortex responses to perception of real speech. We examined groups of patients with schizophrenia with and without current AVH, also requiring that the frequency was high in the former group. We employed a speech perception paradigm incorporating three different conditions, hearing words, sentences, and reversed speech, to determine whether any potential effects on auditory cortex activations might be related to phonetic-acoustic properties of speech, its lexical structure, or presence of sentential meaning. We additionally examined the findings in the whole group of schizophrenia patients compared to matched healthy controls.

Results
The final sample consisted of 23 AVH+ and 23 AVH- patients (from initial samples of 29 and 36, respectively, see Methods), and 25 healthy controls. As shown in Table 1, the three groups were matched for age, sex and estimated premorbid IQ, and the two patient groups did not differ significantly in current IQ. In the AVH+ group, hallucination frequency, as measured over a 5-minute period (see Methods), ranged from 0 to 101 instances (mean = 16.19, SD = 22.54, median = 8).

Imaging results
Perception of words (words vs baseline). In this contrast, the healthy controls showed a pattern of activation involving prominently the temporal lobe cortex, including the bilateral superior and middle temporal cortex (see Fig 1A; S1 Table in S1 File). They also showed activation in the inferior and middle lateral frontal cortex bilaterally, portions of the superior frontal cortex, the insula, pre- and post-central gyri and supplementary motor area. Activation was also seen in the left inferior parietal cortex and the bilateral supramarginal gyri, amygdala, hippocampus/parahippocampus, basal ganglia and thalamus, all bilaterally, and the cerebellum.
De-activations were seen in the medial prefrontal cortex, the precuneus and cuneus, and portions of the superior and inferior parietal and posterior temporal cortices, bilaterally. De-activation was also seen in the posterior portion of the hippocampus and parahippocampal gyrus and the fusiform and occipital cortex, bilaterally (see S1 Fig and S2 Table in S1 File).
The pattern was broadly similar in the schizophrenia patients. However, in the combined group the de-activations appeared slightly less extensive (see Fig 1B; S1 Fig and S1 Table in S1 File).
Comparison between the healthy controls and the combined group of patients showed no regions of significant differences. There were also no clusters of significant difference between the AVH+ and AVH- patients (see S2 Fig and S1 Table in S1 File for mean activation maps for both patient groups).
Perception of sentences (sentences vs baseline). In the healthy controls, hearing sentences elicited extensive activation in the superior and middle temporal gyri, the superior and inferior prefrontal cortex, the pre- and post-central gyri and the supplementary motor area, as well as the left angular and bilateral supramarginal gyri. Activation was also seen in the bilateral amygdala, hippocampus/parahippocampus, basal ganglia, thalamus, and cerebellum (Fig 2A; S3 Table in S1 File). De-activations were seen in the anterior cingulate and the lateral and medial superior prefrontal cortex. Additional areas of de-activation were observed in the bilateral precuneus and superior parietal cortex, the parahippocampal gyrus, the fusiform and occipital cortex, and the right caudate and putamen (S4 Table and Table in S1 File). Group comparison revealed a cluster of significantly reduced activation in schizophrenia patients in the left superior temporal cortex (Fig 2C), located in Heschl's gyrus (MNI coordinates x = -56, y = -2, z = 2; Z = 4.03; cluster size = 374 voxels; p = 0.032). There was also relatively increased activation in the patients affecting portions of the occipital cortex (MNI coordinates x = -38, y = -76, z = 26; Z = 3.77; cluster size = 1354 voxels; p < 0.001), the precuneus (MNI coordinates x = 8, y = -48, z = 54; Z = 4.01; cluster size = 779 voxels; p < 0.001) and the lingual gyrus (MNI coordinates x = -26, y = -64, z = -6; Z = 3.72; cluster size = 503 voxels; p = 0.006). As indicated in Fig 2D, this reflected failure of de-activation in the patients. As in the words contrast, there were no clusters of significant difference between the AVH+ and AVH- groups.
Perception of reversed speech (reversed sentences vs baseline). The findings here were similar to those in the sentences condition. The healthy controls showed activation in the superior, middle, and inferior temporal cortex bilaterally. Activation was also seen in inferior and superior lateral frontal regions and in the pre- and post-central gyri, parts of the inferior and superior parietal cortex, and subcortically in the amygdala, hippocampus and parahippocampus, basal ganglia (on the right only in the caudate) and left thalamus. Regions in the inferior occipital cortex also showed activation (see Fig 3A; S5 Table in S1 File). De-activations were observed in middle and inferior temporal regions bilaterally. The bilateral fusiform gyrus and parts of the hippocampus, parahippocampus, precuneus and cuneus were also de-activated, extending into occipital regions and left superior parietal cortex (S6 Table in S1 File).
The activation and de-activation maps were similar in the combined group of schizophrenia patients (see Fig 3B; S1 Fig and S5 Table in S1 File). Comparison between patients and controls revealed a cluster of reduced activation in the left auditory cortex, involving Heschl's gyrus and extending into the left putamen (MNI coordinates x = -56, y = -2, z = 2; Z = 4.08; cluster size = 1053 voxels; p < 0.001). No group differences were observed for regions of de-activation. As previously, there were no clusters of significant differences between the AVH+ and AVH- patients.

Discussion
This study found that frequently hallucinating and non-hallucinating schizophrenia patients failed to show activation differences when they heard words, sentences or nonsense (i.e. reversed) speech. Otherwise, we found that, as a group, the patients with schizophrenia showed reduced activation in the left superior temporal cortex in the sentences and reversed sentences conditions, though not when listening to words alone.
Our study fails to replicate a single case study and two group studies [18][19][20] which previously all found evidence for reduced activation in hallucinating patients during perception of real speech. One further study has also been considered to provide support for this finding. Ford et al. [21] examined a large group of patients with schizophrenia or schizoaffective disorder (N = 109) while they performed an auditory oddball task, i.e. they heard one of two tones and had to respond by button press to one of them, which was presented 5% of the time. Compared to 111 healthy controls, the patients showed lower activation in response to the low frequency targets compared to the high frequency tones in four ROIs placed in the auditory cortex (primary auditory cortex, BA41; secondary auditory cortex, BA42; auditory association cortex, BA22; and middle temporal gyrus, BA21). When the patient group was divided into hallucinators (N = 66) and non-hallucinators (N = 40) based on having experienced AVH in the previous week, significantly lower activation was found in the hallucinators in the left primary auditory cortex ROI. Reduced activation in the hallucinators was also seen in bilateral ROIs placed in the visual cortex. Clearly, this study had less than robust findings with respect to AVH, given that reduced activation was only seen in one of four auditory cortex regions examined (and also in a non-auditory region, the visual cortex). It should also be noted that the task used did not measure activations in response to auditory stimuli (tones) per se, but rather the difference in activation between commonly and uncommonly presented tones.
Our findings also disagree with a meta-analysis of 11 PET and fMRI studies by Kompus et al. [22]. They found that perception of verbal and non-verbal auditory stimuli in hallucinating schizophrenia patients was associated with reduced activation in the left superior temporal gyrus, among several other areas. However, some of the studies included in this meta-analysis compared patients with moderate or high scores on positive symptoms generally, not high and low scores on AVH specifically. Additionally, activations in the hallucinating/highly symptomatic patients were compared with activations in healthy controls, not non-hallucinating patients, leaving open the possibility that the reduced activation to auditory stimuli found was a function of having schizophrenia generally, not the symptom of AVH specifically.
We in fact found evidence pointing to this latter possibility in our study. The combined group of patients with schizophrenia showed reduced activation in the left primary auditory cortex in two of the three conditions employed, sentences and reversed speech. Although reduced activation in response to a variety of tasks is a common finding in schizophrenia, for example during cognitive control tasks [23], reward anticipation and delivery [24] and emotion processing [25], whether this finding extends to basic auditory perception is uncertain: there have been few relevant studies, and their findings have been conflicting. For example, Woodruff et al. [19] found reduced activation in the left superior temporal gyrus and the auditory association cortex during perception of external speech in schizophrenia patients both with and without hallucinations. In a study employing nonverbal auditory stimuli (laughing and crying), Kang et al. [26] found that both 14 hallucinators and 14 non-hallucinators showed areas of reduced activation compared to 28 healthy controls; however, the sites where this was seen did not include the auditory cortex. Braus et al. [27] simultaneously presented visual (a checkerboard) and auditory (drumbeats) stimuli to 12 first episode patients and 11 healthy controls. The patients showed reduced activation in the right thalamus, the right prefrontal cortex, and regions of the parietal lobe, though not the auditory cortex, bilaterally.
We found that the patients with schizophrenia as a group showed failure of de-activation in one of the three conditions used, sentences vs baseline; this was seen in the precuneus, the left superior occipital cortex and the left lingual gyrus. Failure of de-activation is a well-established finding in schizophrenia, and has been found in association with a variety of tasks (for a review see Hu et al. [28]). It is widely considered to reflect dysfunction of the default mode network [29,30], a set of brain regions, including the medial frontal cortex, the posterior cingulate cortex/precuneus and the angular gyrus, that are normally active at rest but which de-activate during performance of a wide range of attention-demanding tasks. While failure of de-activation in schizophrenia has most commonly been found to affect the medial frontal cortex, it has also been found in the posterior cingulate cortex/precuneus in some studies [31,32]. What distinguishes the sentence condition is presence of sentential meaning, i.e., the fact that thoughts are expressed. One possible speculation here is that failure to de-activate in this condition is due to similar mental processes being involved in the brain's default mode, and schizophrenia involves a failure to segregate between the two networks.
While we found no activation differences between schizophrenia patients with and without AVH using a speech perception task, studies using another auditory perception-related task, mismatch negativity (MMN), have found some evidence of brain functional changes related to this symptom. MMN refers to a wave of negativity that occurs when a sequence of regular auditory stimuli is interrupted by a tone that differs from the others along one or more dimensions, for example pitch or duration. MMN amplitude is known to be reduced in patients with schizophrenia [33], and Fisher and co-workers found that indices of MMN attenuation were correlated with hallucination scores [34,35] and with a trait measure of hallucination proneness [36]. However, numbers were small in these studies (N = 10-12), and the association was not found in another study by the same group [37].
In conclusion, we found no evidence to support the view that presence of auditory hallucinations in patients with schizophrenia reduces activations to real speech, by a presumptive mechanism involving competition for processing resources. A limitation to this negative finding needs to be noted, in that our experimental design was not fine-grained enough to permit measurement of activation to words, sentences and reversed speech at times when AVH were experienced simultaneously with them. A related limitation was that there was considerable variation in the frequency of hallucinations in the AVH+ group, with rates ranging from 0 to over 100 over a five-minute period. It should also be noted that we only examined activations in response to speech stimuli, and it is possible that activations might be different in response to other auditory stimuli such as tones. Despite the negative findings for AVH, we found that the schizophrenia patients as a group showed reduced activation in the left primary auditory cortex during some versions of the auditory perception task. Interpretation of this latter finding is also limited by the above considerations and also by the fact that it emerged unexpectedly, rather than the study being specifically designed to address this possibility. Nevertheless, if genuine, it could possibly point to the presence of an early information processing deficit in the disorder, as argued for by Javitt [38], based on behavioural (e.g. tone matching) and neurophysiological (e.g. event related potential) studies.

Participants
Sixty-five patients meeting DSM-5 criteria for schizophrenia or schizoaffective disorder were initially recruited from four psychiatric hospitals in Barcelona (Benito Menni CASM, Hospital Sagrat Cor de Martorell, Sant Rafael Hospital, Hospital de la Mercè). They were selected based on either experiencing frequent hallucinations (as defined below) (AVH+, N = 29), or having been hallucination-free for at least 6 months (AVH-, N = 36). Diagnoses were made using the Structured Clinical Interview for DSM Disorders (SCID) [39]. Premorbid IQ was estimated using the Word Accentuation Test (Test de Acentuación de Palabras, TAP) [40,41]. This test requires pronunciation of low-frequency Spanish words whose accents have been removed and is conceptually similar to the English-language National Adult Reading Test (NART) [42] and the Wide Range Achievement Test (WRAT) [43]. Current IQ was measured using 4 subtests of the WAIS III (Vocabulary, Similarities, Matrix reasoning and Block design).
Patients were excluded if they (a) were younger than 18 or older than 65, (b) had a history of brain trauma or neurological disease or (c) had shown alcohol/substance abuse/dependence within 12 months prior to participation. Social use of alcohol was permitted, as was non-habitual use of cannabis. Electroconvulsive therapy in the past 6 months was also an exclusion criterion. All participants were right-handed and were taking antipsychotic medication. Based on reasons including failure to complete scanning, excessive head motion, IQ < 70, not being able to recall the task characteristics after scanning, not being completely hallucination-free in the AVH- group, and matching considerations, 23 AVH+ and 23 AVH- patients were finally included.
The control group consisted of 25 healthy individuals, selected to be matched to the two patient groups for age, sex and estimated premorbid IQ. They were recruited from non-clinical staff working in the hospitals, their relatives and acquaintances, plus independent sources in the community. They met the same exclusion criteria as the patients, and they were also interviewed using the SCID to exclude current and past psychiatric disorders. They were questioned and excluded if they reported a history of treatment with psychotropic medication beyond non-habitual use of night sedation. Controls were also excluded if they reported a history of major psychiatric disorder in a first-degree relative.
All participants gave written informed consent prior to participation. All the study procedures had been previously approved by the Research Ethics Committee FIDMAG Sisters Hospitallers (Comité de Ética de la Investigación de FIDMAG Hermanas Hospitalarias) and complied with its ethical standards on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. Healthy controls received a gift-card as a compensation for their participation in the study.

Clinical and cognitive assessment
AVH severity was assessed with the Psychotic Symptom Rating Scale (PSYRATS [44]), auditory hallucinations subscale (PSYRATS-H). This subscale consists of a semi-structured interview with 11 items referring to frequency, duration, controllability, loudness, location; severity and intensity of distress; amount and degree of negative content; beliefs about the origin of voices; and disruption caused by the AVHs. The PANSS [45] was used to rate positive and negative psychotic symptoms. Overall severity of illness was assessed with the Clinical Global Impression [46] and the Global Assessment of Functioning scale (GAF) [47]. All assessments took place within one week of the scanning session.
Patients in the AVH+ group were required to report hearing AVHs at least once a day (score of 2 in item 1, frequency of voices, in the PSYRATS). To obtain a more accurate measurement of hallucination frequency, they were also asked to remain silent in a quiet environment for 5 minutes and tap on the table every time they heard a voice.

Speech perception task
While in the scanner, participants performed an auditory speech perception task with three conditions of interest: spoken words (words), spoken sentences (sentences), and spoken unintelligible reversed speech (reversed). These auditory stimuli were presented in a block-design fashion, with six blocks per condition presented in random order, each lasting 26 seconds with a 2-second inter-block interval. After every three stimulation blocks, a low-level baseline block consisting of white noise was presented, with the same duration as the speech blocks. The session lasted a total of 11 minutes and 22 seconds.
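The stated session length can be checked against the block structure with a few lines of arithmetic. The sketch below is only a consistency check, and it rests on two assumptions that the text does not state explicitly: that a 2-second interval follows every block, and that the 10 seconds discarded for signal stabilization (see the preprocessing section) count towards the total.

```python
# Consistency check of the session duration described above.
# Assumptions (not stated explicitly in the text): a 2 s gap follows every
# block, and the 10 s discarded for signal stabilisation is part of the total.
CONDITIONS = 3              # words, sentences, reversed
BLOCKS_PER_CONDITION = 6
BLOCK_S = 26                # block duration in seconds
GAP_S = 2                   # inter-block interval in seconds
LEAD_IN_S = 10              # 5 discarded volumes at the start of the run

speech_blocks = CONDITIONS * BLOCKS_PER_CONDITION         # 18 speech blocks
baseline_blocks = speech_blocks // 3                       # one white-noise block per three speech blocks
total_blocks = speech_blocks + baseline_blocks             # 24 blocks in all
total_s = LEAD_IN_S + total_blocks * (BLOCK_S + GAP_S)     # total run time in seconds

minutes, seconds = divmod(total_s, 60)
print(f"{total_blocks} blocks, {minutes} min {seconds} s")  # prints "24 blocks, 11 min 22 s"
```

Under these assumptions the total comes to 682 seconds, which matches the reported 11 minutes and 22 seconds.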
In the words condition, stimuli consisted of unrelated neutral word lists (nouns, verbs, and adjectives). In each block, a list of 22 to 24 words was presented. In the sentences condition, a list of 8 unrelated sentences with neutral content was presented in each block. Nouns, verbs, and adjectives from the sentence lists were matched with the ones in the word lists in terms of valence, arousal, and frequency of use, according to normative data from Ferré et al. [48], Guasch et al. [49], Hinojosa et al. [50] and in the EsPal database [51]. Stimuli in the reversed condition consisted of 8 acoustically reversed sentences per block. Stimuli were presented through MRI-compatible headphones (VisuaStim Digital, Resonance Technology, Northridge, CA, USA). To maintain visual stimulation constant and similar for all participants, the task was performed with eyes open while looking at a gray screen shown through MRI-compatible goggles (VisuaStim Digital).
Participants were instructed to remain silent and listen carefully to the recordings during the task. To ensure they had been attending to the presented stimuli, a brief questionnaire was administered immediately after scanning about the type of content heard during the task. Participants also reported their level of attention and, in the case of AVH+ patients, the frequency of hallucinations during the task. Participants who reported not attending to the task or were unable to recall the task characteristics (i.e., they reported hearing something different to word lists, sentence lists and reversed speech, or reported not hearing one of these types of stimuli) were excluded from the analyses.

Image preprocessing and analysis
Preprocessing and analysis were carried out with the FEAT module included in the FSL (FMRIB Software Library) software [52]. The first 10 seconds (5 volumes) of the sequence, corresponding to signal stabilization, were discarded. Preprocessing included motion correction (using the MCFLIRT algorithm), co-registration and normalization to a common stereotactic space (MNI, Montreal Neurological Institute template). Brain extraction was first applied to the structural image and the functional sequence was registered to it. Then the structural image was registered to the standard template. These two transformations were used to finally register the functional sequence to the standard space. Before group analyses, normalized images were spatially filtered with a Gaussian filter (FWHM = 5 mm). To minimize unwanted movement-related effects, individuals with an estimated maximum absolute movement > 3.0 mm or an average absolute movement > 0.3 mm were excluded from the study.
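The head-motion exclusion rule can be expressed as a short function. The sketch below is illustrative only: the function name and the input format (one absolute displacement value in mm per volume) are assumptions made for the example, and it is not the code used in the study. Only the two thresholds come from the text.

```python
import numpy as np

MAX_ABS_MM = 3.0    # exclusion threshold on maximum absolute movement
MEAN_ABS_MM = 0.3   # exclusion threshold on average absolute movement


def exceeds_motion_limits(abs_displacement_mm):
    """Return True if a participant would be excluded for head motion.

    abs_displacement_mm is a hypothetical 1-D sequence holding one absolute
    displacement value (mm, relative to the reference volume) per volume.
    """
    d = np.asarray(abs_displacement_mm, dtype=float)
    return bool(d.max() > MAX_ABS_MM or d.mean() > MEAN_ABS_MM)


# Made-up traces for illustration only.
steady = [0.05, 0.10, 0.12, 0.08]    # within both limits
spiky = [0.05, 0.10, 3.40, 0.08]     # one large excursion
drifting = [0.30, 0.35, 0.40, 0.45]  # small but sustained displacement

print(exceeds_motion_limits(steady),
      exceeds_motion_limits(spiky),
      exceeds_motion_limits(drifting))  # prints "False True True"
```

Either criterion alone is sufficient for exclusion, so a single large excursion and a small sustained displacement are both flagged.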
Statistical analysis was performed by means of a General Linear Model (GLM) approach. At the first level (within-subject) analysis, separate regressors were defined for the words, sentences, and reversed conditions (white noise blocks were not modeled and thus acted as the implicit baseline). Motion parameters obtained from realignment were also included as nuisance covariates. GLMs were fitted to generate individual activation maps for each condition of interest against the white noise baseline. Second level (group) analyses were performed within the FEAT module by means of mixed-effects GLMs [53], to obtain mean activation maps for each group with one-sample t-tests. Two-sample t-tests were performed to compare the patient and control groups, on the one hand, and the two patient subgroups (AVH+ vs. AVH-), on the other. Group analyses were carried out with sex, age, and pre-morbid IQ as nuisance covariates. All statistical tests were carried out at the cluster level with a corrected p < 0.05 using Gaussian random field methods, with a cluster-forming threshold of z > 2.3. To identify the regions activated by the task or showing differences between groups, we used the MNI coordinates on the AAL atlas (Anatomical Automatic Labeling) [54].
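The first-level model can be illustrated for a single voxel with a simplified sketch. This is not a reproduction of the FEAT pipeline: it uses ordinary least squares on synthetic data, a simple gamma-shaped haemodynamic response, a fixed block order (the study randomized it) and no motion covariates, all of which are assumptions made for the example. The repetition time of 2 s is inferred from the statement that 10 seconds correspond to 5 volumes.

```python
import math
import numpy as np

TR = 2.0                  # seconds; inferred from 10 s = 5 volumes
BLOCK_S, GAP_S = 26, 2    # block and inter-block durations in seconds
N_VOLS = 336              # 24 blocks x 28 s / TR, after the discarded volumes


def hrf(t):
    """Gamma-shaped response peaking near 5 s (an assumed, simplified HRF)."""
    shape = 6.0
    return t ** (shape - 1) * np.exp(-t) / math.gamma(shape)


# Hypothetical fixed block order; white noise follows every three speech blocks.
order = ["words", "sentences", "reversed", "noise"] * 6
conditions = ["words", "sentences", "reversed"]

frame_times = np.arange(N_VOLS) * TR
boxcars = {c: np.zeros(N_VOLS) for c in conditions}
for i, label in enumerate(order):
    onset = i * (BLOCK_S + GAP_S)
    if label in boxcars:  # white noise is left unmodelled: implicit baseline
        in_block = (frame_times >= onset) & (frame_times < onset + BLOCK_S)
        boxcars[label][in_block] = 1.0

# Design matrix: one convolved regressor per condition plus an intercept.
kernel = hrf(np.arange(0, 32, TR))
X = np.column_stack(
    [np.convolve(boxcars[c], kernel)[:N_VOLS] for c in conditions]
    + [np.ones(N_VOLS)]
)

# Synthetic voxel that responds to sentences only, plus Gaussian noise.
rng = np.random.default_rng(0)
y = 2.0 * X[:, 1] + 100.0 + rng.normal(0, 0.2, N_VOLS)

# Ordinary least squares fit and a t-statistic for sentences vs baseline.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = N_VOLS - np.linalg.matrix_rank(X)
sigma2 = resid @ resid / dof
c = np.array([0.0, 1.0, 0.0, 0.0])
t_stat = (c @ beta) / np.sqrt(sigma2 * (c @ np.linalg.inv(X.T @ X) @ c))

print(f"sentences beta = {beta[1]:.2f}, t = {t_stat:.1f}")
```

Because the white-noise blocks have no regressor of their own, each condition estimate is expressed relative to them, which is what treating them as the implicit baseline means in practice.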